NMFk analysis: Geothermal data of Brady site, NV

This analysis demonstrates how NMFk can be applied to perform unsupervised machine-learning analysis.

The code below demonstrates the ML work for a submitted research paper analyzing geothermal data from the Brady site, NV.

Import required Julia modules

If NMFk is not installed, first execute import Pkg; Pkg.add("NMFk"); Pkg.add("DelimitedFiles"); Pkg.add("Gadfly"); Pkg.add("Mads").

Read Brady dataset

Set up the working directory containing the Brady data

Load the datafile

Populate the missing well names
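
A minimal sketch of this step, assuming the well name appears only on the first row of each well's block and is blank on the rows that follow. The table `d` and the helper `fill_wellnames!` are hypothetical stand-ins, not the Brady data file or an NMFk function.

```julia
# Hypothetical table: column 1 holds well names, blank ("") where the name repeats.
d = Any["W1" 10.0; "" 20.0; "" 30.0; "W2" 10.0; "" 20.0]

# Forward-fill blank entries in column 1 with the most recent non-blank name.
function fill_wellnames!(d::AbstractMatrix)
    wellname = ""
    for i in axes(d, 1)
        if d[i, 1] != ""
            wellname = d[i, 1]   # remember the latest explicit name
        else
            d[i, 1] = wellname   # reuse it for the blank row
        end
    end
    return d
end

fill_wellnames!(d)
```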

Set missing entries to be equal to zero
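
As an illustration, assuming missing entries arrive as empty strings in a column read from a delimited text file, the replacement could look like the sketch below. The column `col` is a hypothetical example, not a column of the Brady file.

```julia
# Hypothetical column with blanks ("") marking missing entries.
col = Any[1.5, "", 3.0, ""]

# Replace every blank with zero, then convert the column to a numeric vector.
col[col .== ""] .= 0
production = Float64.(col)
```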

Define names of the data attributes (matrix columns; short names used for coding; long names used for plotting and visualization)

Define the attributes that will be processed

Index the attributes that will be processed
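
One way to build such an index is Julia's `indexin`, which returns the position of each requested name in the full list. The attribute names below are placeholders, not the names used in the Brady dataset.

```julia
# Placeholder attribute names (hypothetical).
attributes_short = ["attr_a", "attr_b", "attr_c", "attr_d"]
attributes_process = ["attr_c", "attr_a"]

# Position of each processed attribute within the full list.
# indexin returns `nothing` for names that are not found, so check for that.
index = indexin(attributes_process, attributes_short)
all_found = !any(isnothing, index)
```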

Output information about the processed data (min, max, count):
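
A small sketch of such a summary, assuming missing values are stored as `NaN` and should be excluded from the statistics. The `data` matrix, the attribute names, and the `summarize` helper are hypothetical.

```julia
# Hypothetical data: rows are records, columns are attributes; NaN marks missing values.
data = [1.0 NaN; 2.0 5.0; 4.0 7.0]
attribute_names = ["attr_a", "attr_b"]

# Minimum, maximum, and number of valid (non-NaN) entries in one column.
function summarize(column)
    valid = filter(!isnan, column)
    return (min = minimum(valid), max = maximum(valid), count = length(valid))
end

for (j, name) in enumerate(attribute_names)
    s = summarize(data[:, j])
    println(name, ": min ", s.min, " max ", s.max, " count ", s.count)
end
```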

Get well locations and production

Define well types

Show information for different attributes

Collect the well data into a 3D tensor with indices defining depths, attributes, and wells
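
The layout can be sketched as follows, assuming the records come in long format (one row per well and depth) and that depths are already integer indices. All names and values here are hypothetical; entries with no record stay `NaN`.

```julia
# Hypothetical long-format records.
wells  = ["W1", "W1", "W2"]               # well of each record
depths = [1, 2, 1]                        # depth index of each record
vals   = [0.5 10.0; 0.7 12.0; 0.2 9.0]    # rows = records, columns = attributes

well_names = unique(wells)

# Tensor dimensions: depth x attribute x well; NaN marks entries without data.
T = fill(NaN, maximum(depths), size(vals, 2), length(well_names))
for r in eachindex(wells)
    w = findfirst(==(wells[r]), well_names)
    T[depths[r], :, w] .= vals[r, :]
end
```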

Define the max depth of the data included in the analyses (750 m was selected)

Normalize the tensor slices associated with each attribute
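
A possible form of this normalization is min-max scaling of each attribute slice to the range [0, 1], ignoring `NaN` entries. Min-max scaling is an assumption made for illustration; the scaling actually used in the analysis may differ. The helper `normalize_slices!` is hypothetical.

```julia
# Scale each attribute slice T[:, a, :] to [0, 1] in place, skipping NaN entries.
function normalize_slices!(T::AbstractArray{<:Real,3})
    for a in axes(T, 2)
        slice = view(T, :, a, :)
        valid = [x for x in slice if !isnan(x)]
        isempty(valid) && continue        # nothing to scale
        lo, hi = extrema(valid)
        hi > lo || continue               # constant slice: leave unchanged
        slice .= (slice .- lo) ./ (hi - lo)
    end
    return T
end

# Hypothetical 2 x 2 x 2 tensor (depth x attribute x well) with one missing entry.
T = cat([1.0 10.0; 3.0 NaN], [2.0 20.0; 5.0 30.0]; dims=3)
normalize_slices!(T)
```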

Define problem setup variables

Plot well data

An HTML file named "map/dataset-set00-v9-inv.html" is generated; it provides an interactive visualization of the data.

PNG version of the map looks like this:

ML analysis

For the ML analyses, the data tensor will be flattened in two different ways.

Flatten the tensor into a matrix (type 1)

Matrix rows merge the depth and attribute dimensions.

Matrix columns represent the well locations.
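
For a tensor stored as depth x attribute x well, this flattening is a plain `reshape`, because Julia arrays are column-major. The sketch below uses a small hypothetical tensor; the sizes and variable names are illustrative only.

```julia
# Hypothetical tensor sizes: depths, attributes, wells.
nd, na, nw = 3, 2, 4
T = reshape(collect(1.0:nd*na*nw), nd, na, nw)

# Type 1: rows merge (depth, attribute), columns are wells.
# Row d + (a - 1) * nd of X1 holds depth d of attribute a.
X1 = reshape(T, nd * na, nw)
```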

Perform NMFk analyses

Here the NMFk results are loaded from prior ML runs.
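
For orientation, the factorization underlying this analysis approximates a nonnegative matrix X by a product W * H with k signatures. The sketch below is a plain multiplicative-update NMF for a single fixed k on a matrix without missing values. It is not the NMFk algorithm, which additionally repeats the factorization from many random starts and clusters the solutions to assess robustness. The function `nmf_multiplicative` and the matrix `X` are hypothetical.

```julia
import Random
import LinearAlgebra

# Plain multiplicative-update NMF: find W >= 0 and H >= 0 such that X ≈ W * H.
function nmf_multiplicative(X::AbstractMatrix, k::Integer; iterations::Integer=200, seed::Integer=1)
    rng = Random.MersenneTwister(seed)
    W = rand(rng, size(X, 1), k)
    H = rand(rng, k, size(X, 2))
    tiny = 1e-12                          # guards against division by zero
    for _ in 1:iterations
        H .*= (W' * X) ./ (W' * W * H .+ tiny)
        W .*= (X * H') ./ (W * (H * H') .+ tiny)
    end
    return W, H
end

# Hypothetical nonnegative matrix built from two underlying row patterns.
X = [1.0 2.0 3.0; 2.0 4.0 6.0; 1.0 1.0 1.0; 3.0 5.0 7.0]
W, H = nmf_multiplicative(X, 2)
relative_error = LinearAlgebra.norm(X - W * H) / LinearAlgebra.norm(X)
```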

As seen from the output, the ML analyses identified that the optimal number of geothermal signatures in the dataset is 6.

Solutions with a number of signatures less than 6 are underfitting.

Solutions with a number of signatures greater than 6 are overfitting and unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2, 5, and 6 signatures.
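
The selection logic can be illustrated with a simple threshold on the silhouette width: a solution is treated as acceptable when its robustness exceeds a cutoff, and the optimal k is the largest acceptable one. The silhouette values and the cutoff below are made up for illustration and are not the Brady results.

```julia
# Made-up silhouette widths for k = 2:8 (NOT the Brady results).
nkrange = 2:8
silhouette = [0.95, 0.10, 0.20, 0.80, 0.70, 0.15, -0.05]
cutoff = 0.25   # hypothetical robustness threshold

# Acceptable solutions: robustness above the cutoff.
acceptable = [k for (k, s) in zip(nkrange, silhouette) if s > cutoff]

# Optimal solution: the largest number of signatures that is still robust.
optimal = maximum(acceptable)
```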

Plot results

Plot of solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The ML solutions containing 2, 5 and 6 signatures are further analyzed as follows:

Flatten the tensor into a matrix (type 2)

Matrix rows merge the depth and well-location dimensions.

Matrix columns represent the well attributes.
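
For a tensor stored as depth x attribute x well, this second flattening needs the attribute dimension moved last before reshaping. The sketch below uses a small hypothetical tensor; the sizes and variable names are illustrative only.

```julia
# Hypothetical tensor sizes: depths, attributes, wells.
nd, na, nw = 3, 2, 4
T = reshape(collect(1.0:nd*na*nw), nd, na, nw)

# Type 2: rows merge (depth, well), columns are attributes.
# permutedims reorders the tensor to depth x well x attribute before the reshape,
# so row d + (w - 1) * nd of X2 holds depth d of well w.
X2 = reshape(permutedims(T, (1, 3, 2)), nd * nw, na)
```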

Perform NMFk analyses

Here the NMFk results are loaded from prior ML runs.

As seen from the output, the ML analyses identified that the optimal number of geothermal signatures in the dataset is 3.

Solutions with a number of signatures less than 3 are underfitting.

Solutions with a number of signatures greater than 3 are overfitting and unacceptable.

The set of acceptable solutions is defined as follows:

The acceptable solutions contain 2 and 3 signatures.

Plot results

Plot of solution quality (fit) and silhouette width (robustness) for different numbers of signatures k:

The ML solutions containing 2 and 3 signatures are further analyzed as follows: